Can We Get A Better Retrieval Function From Machine?
Abstract
The quality of an information retrieval system depends heavily on its retrieval function, which returns a similarity measurement between the query and each document in the collection. Documents are sorted by their similarity to the query, and those ranked highest are assumed to be relevant. Okapi BM25 and its variations are very popular retrieval functions and appear to be the default choice in the IR research community; many other functions, such as Pivoted TFIDF and INQUERY, are also widely used and well studied. Most retrieval functions in use today are based on probabilistic theories and are adjusted in practice to suit different contexts and information needs. In this paper, we propose that a good retrieval function can be discovered by a pure machine learning approach, without using probabilistic theories or knowledge-based techniques. Two machine learning algorithms, Support Vector Machine (SVM) and Genetic Programming (GP), are used for retrieval function discovery, and GP is found to be the more effective approach. The retrieval functions discovered by GP may be hard for humans to interpret, but their performance is superior to that of Okapi BM25, one of the most popular functions. When the new retrieval function is combined with query expansion techniques, retrieval performance improves significantly. Based on the observations in our empirical study, the GP function is more reliable and effective than Okapi BM25 when query expansion techniques are used.
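To make the notion of a retrieval function concrete, here is a minimal sketch of a BM25-style scorer that returns one similarity value per document and ranks documents by it. It is not the paper's implementation: the function names `bm25_scores` and `rank`, the whitespace tokenizer, the smoothed IDF variant, and the parameter values `k1=1.2` and `b=0.75` are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one BM25-style score per document (illustrative sketch).

    Assumptions: lowercase whitespace tokenization, no stemming, a smoothed
    IDF that stays non-negative, and at least one non-empty document.
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b controls length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def rank(query, docs):
    """Return document indices sorted from highest to lowest score."""
    scores = bm25_scores(query, docs)
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    docs = [
        "genetic programming for retrieval functions",
        "support vector machine classification",
        "tunnel boring machine schedule",
    ]
    print(rank("genetic retrieval", docs))
```

A learned retrieval function, such as one discovered by GP, would presumably fill the same role as `bm25_scores` here: it maps a query and a document to a score, and the ranking step stays unchanged.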
Similar resources
An improved radial basis function neural network for object image retrieval
Radial Basis Function Neural Networks (RBFNNs) have been widely used for classification and function approximation tasks. Hence, it is worthy to try improving and developing new learning algorithms for RBFNNs in order to get better results. This paper presents a new learning method for RBFNNs. An improved algorithm for center adjustment of RBFNNs and a novel algorithm for width determination ha...
ImageSaker: A Semantic-based Image Retrieval System Refining with Concept Model
In this demonstration, a two-level system for semantic-based image retrieval is proposed. To overcome the shortcoming of the traditional retrieval system, we present a novel method which can provide effective retrieval result in a short time. Firstly, it uses surrounding text to get a related candidate image set. Secondly, a semantic network is used to map the keyword to one of concept models w...
Everything Gets Better All the Time, Apart from the Amount of Data
The paper first addresses the main issues in current content-based image retrieval to conclude that the largest factors of innovations are found in the large size of the datasets, the ability to segment an image softly, the interactive specification of the user’s wish, the sharpness and invariant capabilities of features, and the machine learning of concepts. Among these everything gets better ...
TBM Tunneling Construction Time with Respect to Learning Phase Period and Normal Phase Period
In every tunnel boring machine (TBM) tunneling project, there is an initial low production phase so-called the Learning Phase Period (LPP), in which low utilization is experienced and the operational parameters are adjusted to match the working conditions. LPP can be crucial in scheduling and evaluating the final project time and cost, especially for short tunnels for which it may constitute a ...
Fast Inference and Learning for Modeling Documents with a Deep Boltzmann Machine
We introduce a type of Deep Boltzmann Machine (DBM) that is suitable for extracting distributed semantic representations from a large unstructured collection of documents. We propose an approximate inference method that interacts with learning in a way that makes it possible to train the DBM more efficiently than previously proposed methods. Even though the model has two hidden layers, it can b...